2024-08-06 09:10:07 · AIbase · 10.8k
ControlMM: Multi-modal Input for Full-body Motion Generation from Text, Speech, and Music
ControlMM is a technical framework developed jointly by the Chinese University of Hong Kong and Tencent to address the challenges of multi-modal full-body motion generation. The framework accepts text, speech, and music as conditioning inputs and generates full-body motion that matches the given content. Its ControlMM-Attn module processes the dynamic and static aspects of human body topology in parallel, enabling efficient learning of motion knowledge. Training proceeds in stages, from text-to-motion pre-training to multi-modal control adaptation, so the model remains effective under a range of conditioning signals.
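The article does not give implementation details, but the idea of processing dynamic motion features and the static skeleton topology in parallel attention branches can be sketched roughly as follows. This is a minimal NumPy illustration, not ControlMM's actual code: the shapes, the `attention` helper, and the sum-based fusion are all assumptions made for the example.

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention with a numerically
    # stable softmax over the key dimension.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
J, D = 22, 64  # hypothetical: 22 body joints, 64-dim features

x = rng.standard_normal((J, D))           # per-joint motion features (dynamic)
topo = rng.standard_normal((J, D))        # embedding of the fixed skeleton topology (static)

dyn = attention(x, x, x)                  # dynamic branch: self-attention over joint features
sta = attention(x, topo, topo)            # static branch: attend to the skeleton embedding
fused = dyn + sta                         # parallel branches fused (here: simple summation)

print(fused.shape)  # (22, 64)
```

The two branches run on the same joint features, so the fused output keeps the per-joint layout while mixing motion-dependent and skeleton-dependent context.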